Improved Emotion Recognition with Novel Global Utterance-level Features
Authors
Abstract
Traditional features, which are extracted frame by frame, cannot accurately reflect the dynamic characteristics of emotional speech signals. To address this problem, novel global utterance-level features are first proposed, computed with multi-scale optimal wavelet packet decomposition without dividing the emotional speech into frames. Then, because only a small number of training samples is available, a fusion strategy based on metric learning, called weak metric learning in this work, is proposed for fusing the global and traditional features. Experimental results with LIBSVM show that fusing the novel global features with the traditional features yields significant improvements of about 5.2% to 13.6% over using local utterance-level features alone.
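To illustrate what "global utterance-level features without framing" can mean, the sketch below runs a multi-scale wavelet packet decomposition over the entire utterance and keeps the log-energy of every node at every level. This is only an assumption-laden illustration, not the authors' method: the abstract does not specify the wavelet, the "optimal" basis-selection criterion, or the exact features, so the Haar filter, the log-energy feature, the `max_level` parameter, and the function names `haar_split` and `wavelet_packet_energies` are all hypothetical choices made here.

```python
import numpy as np


def haar_split(x):
    """One orthonormal Haar analysis step (assumed wavelet, not from the paper)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:  # pad odd-length signals by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
    return approx, detail


def wavelet_packet_energies(signal, max_level=3):
    """Hypothetical global features: log-energy of every wavelet packet node.

    The whole utterance is decomposed at once (no framing). Unlike a plain
    wavelet transform, both the approximation and the detail branches are
    split again at each level, giving 2 + 4 + ... + 2**max_level nodes.
    Nodes are kept in natural (not frequency-sorted) order, and no optimal
    basis selection is performed.
    """
    features = []
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(max_level):
        next_nodes = []
        for node in nodes:
            next_nodes.extend(haar_split(node))
        nodes = next_nodes
        for node in nodes:
            # small epsilon avoids log(0) for silent sub-bands
            features.append(np.log(np.sum(node ** 2) + 1e-12))
    return np.array(features)


# Usage: a synthetic 1024-sample tone stands in for an utterance.
fs = 16000
t = np.arange(1024) / fs
utterance = np.sin(2 * np.pi * 440 * t)
global_features = wavelet_packet_energies(utterance, max_level=3)
print(global_features.shape)
```

Because the Haar step is orthonormal, the node energies at each level sum to the energy of the input (for even-length nodes), so each level is a different-resolution partition of the same total energy. In the paper's pipeline, a fixed-length vector of this kind would then be fused with the frame-based features before classification; the weak-metric-learning fusion itself is not sketched here because its details are not given in the abstract.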
Similar resources
Improving emotion recognition using class-level spectral features
Traditional approaches to automatic emotion recognition from speech typically make use of utterance level prosodic features. Still, a great deal of useful information about expressivity and emotion can be gained from segmental spectral features, which provide a more detailed description of the speech signal, or from measurements from specific regions of the utterance, such as the stressed vowel...
Class-level spectral features for emotion recognition
The most common approaches to automatic emotion recognition rely on utterance level prosodic features. Recent studies have shown that utterance level statistics of segmental spectral features also contain rich information about expressivity and emotion. In our work we introduce a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed ov...
Fusion of global statistical and segmental spectral features for speech emotion recognition
Speech emotion recognition is an interesting and challenging speech technology, which can be applied to broad areas. In this paper, we propose to fuse the global statistical and segmental spectral features at the decision level for speech emotion recognition. Each emotional utterance is individually scored by two recognition systems, the global statistics-based and segmental spectrum-based syst...
Improved Frame Level Features and SVM Supervectors Approach for the Recogniton of Emotional States from Speech: Application to categorical and dimensional states
The purpose of speech emotion recognition system is to classify speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral and happiness. Speech features that are commonly used in speech emotion recognition (SER) rely on global utterance level prosodic features. In our work, we evaluate the impact of frame-level feature extraction. The speech samples are fro...
Incremental emotion recognition
Most emotion recognition systems do not perform real-time emotion recognition due to latencies caused by phrase segmentation and resource-intensive feature acquisition, etc. To address this issue, we present an emotion recognition approach that can estimate speaker emotions with much lower latency. The proposed approach does not rely on phrase-level features to recognize speaker emotion; rather...